This study reviews research by others on anomalous behavior in crime, focusing specifically on
the linkage between anomalous behaviors. It then gives a comprehensive review of gait recognition
work that uses machine learning algorithms for the detection and recognition of anomalous behavior.
The review begins with the conventional approach to gait recognition, which comprises feature
extraction and classification using PCA, OLS, ANN, and SVM. It then covers the use of deep learning,
namely CNN, for anomalous gait behavior detection, and transfer learning using pre-trained CNNs such
as AlexNet, VGG, and a few more. To the best of our knowledge, very few studies have investigated
crime-related anomalous behavior based on gait; hence this will be the subject of our next study.

Keywords: Anomalous behavior, CNN, Crime

1. INTRODUCTION

Humans generally follow a common routine of movements while passing through places, whether or not
those places are related to their destination. According to psychologists and behavioral scientists,
individuals are guided in their behavior by the specific place and situation [1, 2]. Situations control
characteristics and cause individuals to behave almost exclusively according to the place and situation
in which they find themselves [3]. The influence is evoked by the individual's psychological cognitive
functions, and the response is eventually decoded automatically through physiological movement. It can
be concluded that place and situation are interwoven with human nature, psychology, and physiology [4, 5].
This explains why patterns of movement change in accordance with the intent of the subject, especially
for those who possess a distinct intention. Thus, anomalous behavior can be observed in human movement
when a person's intent differs from the purpose of the place, and in the way that person responds
to the situation at that instant.

There are two reasons for anomalous behavior in crime from the psychological and physiological
points of view. Low self-control and opportunity are the two main elements that the general theory
of crime identifies as driving criminal activity [6]. The opportunity element is supported by a finding
that residences without a surveillance system were six times more at risk of becoming victims
of housebreaking [7]. Meanwhile, the social and economic conditions of the perpetrator are the root cause
of the low self-control element, which increases the tendency to commit crime. This statement is supported
by numerous studies that list the dominant factors influencing people with low self-control to commit
crimes, including high cost of living, poor family conditions, financial difficulties and unemployment
[8, 9]. People in need are psychologically inclined to see crime as a way to meet their needs, and this
mindset can lead to actual crime.

When committing a crime, perpetrators unconsciously draw on information from cognitive psychology
to imitate a familiar modus operandi [10, 11], and this is expressed automatically through physiological
reactions affiliated with the autonomic nervous system [12, 13]. Imitation can be developed through
observation or through instruction by a crime accomplice [14]. Thus, the behavior of a perpetrator depends
on the degree of imitation, with slight differences due to reinforcement, or similar behaviors are
reproduced from one perpetrator to another for each crime. The formation of a criminal behavior pattern
for each type of crime can be observed at this phase.

It is impossible to enumerate all the anomalous behaviors in the real world. Therefore, anomalous
behavior in the epistemology of crime is described as behavior that deviates from its common state with
the intention to threaten human property, life, and freedom. This definition is acceptable but should not
be adopted uncritically, without a proper understanding of the vagueness between the traits of normal
and anomalous behaviors, considering that the difference between them is slight [15].


2. PREVIOUS WORKS ON RECOGNITION OF ANOMALOUS BEHAVIOR USING MACHINE 
LEARNING 
Machine learning algorithms have proven more suitable for classification than conventional
algorithms. Studies on the detection of anomalous behavior have accelerated as a domino effect
of developments in gait recognition using machine learning. Various algorithms have been developed
and evaluated with the aim of higher accuracy as well as rapid detection of anomalous behaviors
or activities.


2.1. Traditional machine learning algorithms and anomalous behavior 

Conventionally, two stages are essential for machine learning algorithms to achieve better
performance, namely feature extraction and classification. Feature extraction is the process of determining
the significant features of the image, and classification involves determining the optimum tuning
parameters of the classifier in order to achieve high accuracy during detection. Prior to these two stages,
pre-processing is almost mandatory for each image of a video sequence to generate the region of interest
for the feature extraction and classification process. The moving region of an image can be detected
using several methods, for instance background subtraction, statistical methods, temporal differencing,
and optical flow [16], along with feature optimization using shadow detection, morphological image
operations such as erosion and dilation, and a few more [17].
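
The moving-region step described above can be illustrated with a minimal sketch of temporal
differencing followed by morphological opening (erosion, then dilation). This is not the pipeline
of [16] or [17]: the function names, the 3x3 neighbourhood and the threshold of 25 grey levels are
assumptions made for illustration, and frames are plain nested lists of grey values.

```python
def temporal_difference(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose grey value changed by more than `threshold` (assumed value)."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]


def _morph(mask, rule):
    """Apply `rule` (all -> erosion, any -> dilation) over each 3x3 neighbourhood.

    Pixels outside the image are treated as background (0).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
                      for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)]
            out[r][c] = 1 if rule(window) else 0
    return out


def erode(mask):
    return _morph(mask, all)


def dilate(mask):
    return _morph(mask, any)


def bounding_box(mask):
    """Region of interest as (top, left, bottom, right), or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1]) if rows else None


# Toy example: a 3x3 moving blob plus one stray noisy pixel in the corner.
previous = [[0] * 5 for _ in range(5)]
current = [[0] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        current[r][c] = 200
current[0][4] = 200

mask = temporal_difference(previous, current)
opened = dilate(erode(mask))      # opening removes the stray pixel
roi = bounding_box(opened)
```

Opening is used here because erosion alone would also shrink the subject; the subsequent dilation
restores the blob to its original extent in this toy case.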


2.1.1. Feature extraction method 

Machine learning algorithms view an image as an object made up of thousands or even millions
of pixels. Mathematically, the numerical values of these pixels are represented as feature vectors
to enable machine learning to understand the pattern of the image. Dimensionality reduction algorithms help
to eliminate the redundancy in the feature vectors of the image and preserve the significant features [18].
Gait recognition equally faces the challenge of high dimensionality, but it is mitigated by the many
dimensionality reduction techniques available, for instance independent component analysis (ICA), linear
discriminant analysis (LDA), orthogonal least squares (OLS) and principal component analysis (PCA).

An important step of feature extraction in gait recognition is to extract the subject silhouette
from the images through background subtraction. For instance, images of subjects walking in various
environments from two gait databases were employed to perform gait recognition on human silhouettes using
ICA and a nearest neighbor (NN) classifier [19]. ICA projects the silhouette to a lower dimensional space
to identify components that are statistically independent, or independent components (ICs). Classification
was performed with NN on the ICs from the two databases, 90 ICs for the MUD dataset and 300 ICs for
the NLPR dataset. Recognition rates of 100% and 95% were obtained for the MUD and NLPR datasets,
respectively. In addition, one study employed OLS for feature extraction in gait analysis, evaluating
and validating the ability of OLS using the skeleton joint coordinates of a Kinect sensor, and achieved
good performance with 95.33% recognition accuracy upon reducing the number of skeleton joints
from 60 to 28 [20].

On the other hand, PCA is one of the most preferred algorithms for dimensionality reduction in the gait
community. A study of gait for identifying the health of runners retained the principal components (PCs)
with eigenvalues greater than one that accounted for at least 80% cumulative percentage of total variance.
In 2015, Phinyomark et al. extracted 900 features from three planes, i.e. frontal, transverse and sagittal,
and three joints, i.e. ankle, knee and hip. Four PCs were retained for each plane [21]. A study on the ground
reflex pressure signal of sensing shoes applied PCA and kernel PCA (KPCA) to select the best features
of a walking pattern. Similar to the earlier study, the retained PCs have eigenvalues greater than one, but
a higher cumulative percentage of up to 95% was selected. KPCA outperformed PCA in selecting the significant
features, but KPCA requires more runtime because it operates in a higher dimensional space, as reported
in [22]. Further, a study on walker-assisted gait retained the PCs according to three criteria, viz.
eigenvalues greater than one, 60% to 70% cumulative percentage, and the variation of data based on
scree plot analysis. Thus, four PCs were selected from 31 gait variables [23].
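
The retention rules reported in these studies can be sketched as follows, assuming the eigenvalues
of the correlation matrix have already been computed. The function names and the example eigenvalues
are hypothetical; they are not taken from [21], [22] or [23].

```python
def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of principal components whose eigenvalue exceeds `cutoff`."""
    return sum(1 for value in eigenvalues if value > cutoff)


def cumulative_count(eigenvalues, target=0.80):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `target` (e.g. 0.80 for 80%)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a 5-variable correlation matrix.
example = [4.0, 2.0, 1.5, 0.3, 0.2]
by_eigenvalue = kaiser_count(example)            # components with eigenvalue > 1
by_variance_80 = cumulative_count(example, 0.80)
by_variance_95 = cumulative_count(example, 0.95)
```

In this toy case both the eigenvalue rule and the 80% rule keep three components, while raising the
target to 95% keeps a fourth, which mirrors how a stricter cumulative percentage retains more PCs.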

An anomalous gait behavior study used PCA, which determines the best components of the data,
and multiple discriminant analysis (MDA), which finds the best separability between the data,
to determine the feature vectors of each image in a gait energy image (GEI) gallery. A threshold
value for individual recognition is determined, and if the distance between the feature vectors
in the GEI gallery and those of the detected human is less than the threshold value, an alarm
is activated [24].
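
As an illustration of the gallery-matching idea, the sketch below builds a gait energy image
as the pixel-wise mean of aligned binary silhouettes (the usual definition of a GEI) and compares
a probe with a gallery by Euclidean distance. It is a simplification: the study in [24] compares PCA
and MDA feature vectors rather than raw GEIs, and the function names and threshold are assumptions.

```python
import math


def gait_energy_image(silhouettes):
    """Pixel-wise mean of equally sized binary silhouettes from one gait cycle."""
    n = len(silhouettes)
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[r][c] for s in silhouettes) / n for c in range(width)]
            for r in range(height)]


def euclidean_distance(image_a, image_b):
    """Euclidean distance between two images of the same size."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(image_a, image_b)
                         for a, b in zip(row_a, row_b)))


def closest_match(probe, gallery, threshold):
    """Return (is_below_threshold, smallest_distance) for a probe against a gallery."""
    best = min(euclidean_distance(probe, entry) for entry in gallery)
    return best < threshold, best


# Two toy 2x2 silhouettes from the same (hypothetical) subject.
frames = [[[1, 0], [1, 1]],
          [[1, 0], [0, 1]]]
gei = gait_energy_image(frames)                  # [[1.0, 0.0], [0.5, 1.0]]
empty = [[0, 0], [0, 0]]
matched, distance = closest_match(gei, [empty], threshold=1.0)
```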


2.1.2. Classification method 

Gait recognition involves large datasets with many features. It is important to select suitable
classifiers that can handle such data. Among the many classifier algorithms, artificial neural network (ANN)
and support vector machine (SVM) are popular in gait recognition due to the ability of both algorithms
to handle nonlinear relationships in the data. A 3-layer multi-layer perceptron (MLP) ANN was
utilized to distinguish the silhouette pattern for specific activities such as sitting, bending, walking,
running or ‘No activity’ [17]. Each video frame was transformed into a single dimensional vector of a binary
human silhouette using several motion segmentation methods, namely background subtraction, shadow
detection and morphological processing. A distance vector and a motion vector were the features extracted
from the human activities. The distance vector was then fed into the ANN as the input to the first network
layer and the motion vector as the second input. These layers have 10 hidden neurons with the hyperbolic
tangent sigmoid as the activation function. The output layer classified the pattern according to the value
of each activity. Results showed that ‘No activity’ recorded the highest accuracy of 99% while running
recorded the lowest accuracy of 87%.

Furthermore, anomalous behavior can be identified using angular and linear displacement during
body movement [16]. These displacements were calculated from the body joints to the body centroid, which
indicated the extent to which the body was enlarged. Next, a single hidden layer MLP with the scaled
conjugate gradient training method and sigmoid activation function was used as the classifier. Normal
behavior and anomalous behavior were denoted as ‘zero’ and ‘one’ respectively in the supervised learning
process and the threshold value was fixed at 0.5. As reported, the output was classified as normal behavior
if the value of the detected images was equal to or more than 0.5, and as anomalous behavior if the value
was less than the threshold. Walking and struggling were accurately recognized but running behavior only
attained 89.3% accuracy. Moreover, specific human behavior can be estimated from part of the body. A basic
action such as pointing can be recognized from the elbow vector by an ANN classifier [25]. Six body joints
were selected from the upper limb of the Kinect body structure, including neck, shoulder, elbow, and hip.
The angle of each body joint was given a specific range according to the pointing behavior. The classifier
for this study was an ANN with six neurons in the input and hidden layers and 4 neurons in the output layer,
trained with a learning rate of 0.01 and momentum of 0.8.
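
A minimal sketch of joint-to-centroid displacement features is given below. It assumes 2D joint
coordinates keyed by hypothetical joint names and takes the centroid as the mean of the joints;
the exact formulation used in [16] may differ.

```python
import math


def centroid(joints):
    """Mean (x, y) position of all joints."""
    xs, ys = zip(*joints.values())
    return sum(xs) / len(xs), sum(ys) / len(ys)


def joint_displacements(joints):
    """Map each joint to (linear distance, angle in degrees) relative to the centroid."""
    cx, cy = centroid(joints)
    features = {}
    for name, (x, y) in joints.items():
        dx, dy = x - cx, y - cy
        features[name] = (math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)))
    return features


# Hypothetical three-joint pose whose centroid is the origin.
pose = {"head": (0.0, 2.0), "left_foot": (-1.0, -1.0), "right_foot": (1.0, -1.0)}
features = joint_displacements(pose)
```

Larger linear distances indicate limbs stretched away from the body centre, which is the kind of
body enlargement the text associates with anomalous movement.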

Studies on anomalous gait behavior using SVM are more diverse than those using ANN, covering
fall detection, joint pain, autism, Parkinson disease and many more. One previous fall detection study,
based on depth images and skeletal data points from Kinect with a radial basis function (RBF) SVM and
cross validation, achieved a detection rate of 98.33%. Changes in human body height were analyzed by
calculating the velocity of the head and the height of certain body joints from the floor [26].
Additionally, a multi-class SVM with the one-against-one (OAO) method was applied to classify five human
behaviors in order to detect falls. Because movements change the shape of the human body, the horizontal
and vertical lengths were calculated from the projection histogram of 2D binary silhouette images.
The head was considered the highest point of the silhouette and the vertical threshold was determined
according to head position to monitor the changes in height. Walking, running, bending and sitting were
the normal activities, and the anomalous activities were stumbling and limping. The dataset was collected
from 48 subjects, 20 to 30 years old. Each subject repeated each activity five times during image
acquisition at the experimental venue. OAO with the RBF kernel showed the best sensitivity and specificity
of the three kernel functions used for classification, i.e. dot, [...]. Another study utilized four SVM
kernels, namely linear, quadratic, polynomial and RBF [27]. The recognition rate of the RBF kernel was
yet again the highest at 96.8%.
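
The silhouette measurements mentioned above can be sketched with projection histograms, as below.
The helper names are hypothetical and the silhouettes are small binary grids chosen for illustration,
not data from the cited studies.

```python
def projection_histograms(silhouette):
    """Return (row_counts, column_counts) of foreground pixels in a binary silhouette."""
    row_counts = [sum(row) for row in silhouette]
    column_counts = [sum(column) for column in zip(*silhouette)]
    return row_counts, column_counts


def extent(histogram):
    """Length spanned by the non-empty bins of a projection histogram."""
    occupied = [i for i, count in enumerate(histogram) if count > 0]
    return occupied[-1] - occupied[0] + 1 if occupied else 0


def head_row(silhouette):
    """Index of the topmost row containing foreground, taken as the head position."""
    for index, row in enumerate(silhouette):
        if any(row):
            return index
    return None


standing = [[0, 1, 0],
            [0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]
lying = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 1, 1, 1]]

rows, cols = projection_histograms(standing)
standing_height, standing_width = extent(rows), extent(cols)
lying_rows, lying_cols = projection_histograms(lying)
lying_height, lying_width = extent(lying_rows), extent(lying_cols)
```

A sudden drop in height together with a lower head row, as between the two toy silhouettes, is the kind
of change a vertical threshold on head position is meant to catch.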

On the other hand, the gait of healthy people, people in pain, and people with certain diseases
was investigated by Shetty and Rao. In this study, Parkinson disease (PD), amyotrophic lateral
sclerosis (ALS) and Huntington disease (HD) patients as well as healthy controls were classified
based on leg movements. Data were acquired with a force measurement device over numerous strides
of gait cycles. Twelve related features were extracted and classification was done using an RBF SVM
with both generalization parameters, C and gamma, set to 1. Six out of eight PD patients were accurately
classified [28]. In addition, it is well known that autism spectrum disorder (ASD) patients usually
demonstrate stereotyped behaviors. A study on ASD listed the three most frequent behaviors among ASD
patients, namely body rocking, hand flapping and top spinning. This study employed SVM and hidden Markov
model (HMM) to classify these behaviors from joint data of Kinect RGB-D images. 10-fold cross validation
was applied for both the HMM and SVM algorithms. The SVM classifiers used two types of kernel function,
RBF and polynomial, while the HMM classifiers used diagonal and spherical covariance. The SVM RBF
classifier had the best results, especially in classifying the top spinning behavior, but HMM with
diagonal covariance obtained better results in identifying all ASD behaviors, with a perfect 100%
for body rocking and top spinning [29].
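
For readers unfamiliar with k-fold cross validation, the sketch below shows one way to partition
sample indices into 10 folds so that every sample is used for testing exactly once. It is a generic
illustration without shuffling or stratification, not the protocol of [29]; `k_fold_indices`
is a hypothetical helper.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k (train_indices, test_indices) pairs.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


folds = k_fold_indices(25, k=10)
test_sizes = [len(test) for _, test in folds]
```

Each classifier would be trained on the `train` indices and scored on the `test` indices of every fold,
with the reported accuracy being the average over the folds.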


2.2. Deep learning and anomalous behavior 

CNN is a typical deep learning structure for gait recognition and object detection owing to its strong
results. A CNN automatically learns the useful features of large input data: each filter of pre-defined
size in the convolution layers slides along the images with a uniform stride to produce a convolution
map [30]. The pooling layer rescales the map and reduces the feature matrix of the convolution maps
to minimize redundant pixels [31]. The stochastic gradient descent with momentum (SGDM) optimization
function and the rectified linear unit (ReLU) activation function are often utilized during the learning
process in the convolution layer. The second convolution layer then learns from the preserved feature
vectors of the first convolution maps, and the process continues as the subsets of features start to connect
with each other towards the fully connected and classification output layers [32]. A CNN decreases the number
of parameters in steps; the number of connections is reduced during the convolution process and the number
of shared weights is reduced during the pooling process [33].
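
The convolution, ReLU and pooling steps described above can be sketched in plain Python as follows.
This is an illustrative single-channel example without padding in the sliding-window functions (and,
as in most deep learning libraries, the "convolution" is computed as a cross-correlation); the function
names, input and kernel are assumptions.

```python
def output_size(n, k, stride=1, padding=0):
    """Spatial size of a feature map after sliding a k-wide window over n pixels."""
    return (n + 2 * padding - k) // stride + 1


def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` with a uniform stride (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = output_size(len(image), kh, stride)
    out_w = output_size(len(image[0]), kw, stride)
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Rectified linear unit applied element-wise."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    out_h = output_size(len(feature_map), size, stride)
    out_w = output_size(len(feature_map[0]), size, stride)
    return [[max(feature_map[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


image = [[1, 2, 0, 1],
         [0, 3, 1, 0],
         [2, 0, 4, 1],
         [1, 1, 0, 2]]
kernel = [[1, 0],
          [0, -1]]

conv_map = conv2d(image, kernel)           # 3x3 convolution map
activated = relu(conv_map)                 # negative responses set to zero
pooled = max_pool(activated, size=2, stride=1)
```

The same 2x2 kernel weights are reused at every position of the image, which is why a convolution
layer needs far fewer parameters than a fully connected layer over the same input.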

Visual tracking challenges CNNs to track humans under variations of pose, viewpoint or occlusion [34].
Ten challenging datasets comprising variations in illumination, scale, resolution, deformation, occlusion
and background were selected to evaluate the performance of a CNN on a humble hardware platform.
A five-layer CNN was developed, comprising convolutional, pooling, normalization, fully-connected
and softmax layers. The convolutional layer used 50 filter banks of size 4x4xk channels, a stride of two
and zero padding. The pooling layer was set with the max operator, a filter size of 2x2, a stride of two
and zero padding. The normalization layer had four pre-defined hyperparameters, k=1, o=1/4, p=2 and B=0.5.
The fully-connected layer then flattened all the extracted features and connected them to the softmax node.
The softmax operator evaluated the log loss to classify the output. The training hyperparameters were fixed
for the whole process: a maximum of 5 epochs, a learning rate of 0.001 and a batch size of 10. The CNN was
tested for its ability to track under occlusion using the women dataset, since this dataset provides
variation of pose with partial occlusion of both the upper and lower limbs. Variation of body deformation
was tested using a video of a basketball game that contained many scenes of deformation. The tracking
results were based on the center location error, which is the average Euclidean distance between
the tracked person and the ground-truth positions over the frames. The average percentage is 91.31%
for the basketball dataset and 94.14% for the women dataset.
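
The center location error can be written as a short function. The sketch assumes each track is a list
of (x, y) centre positions, one per frame; the name `center_location_error` and the sample coordinates
are hypothetical.

```python
import math


def center_location_error(tracked, ground_truth):
    """Average Euclidean distance between tracked and ground-truth centres."""
    distances = [math.hypot(tx - gx, ty - gy)
                 for (tx, ty), (gx, gy) in zip(tracked, ground_truth)]
    return sum(distances) / len(distances)


# Two-frame toy track: perfect in the first frame, 5 pixels off in the second.
tracked_centres = [(0.0, 0.0), (3.0, 4.0)]
true_centres = [(0.0, 0.0), (0.0, 0.0)]
error = center_location_error(tracked_centres, true_centres)
```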

Three anomalous behaviors involving violent activities were studied using a six-layer CNN of three
convolution layers, two fully connected layers and one softmax layer [35]. The filter size, number of filters,
convolution stride and padding were fixed for all three layers of convolution, pooling and ReLU. The max
operator was used in the pooling layers. The first fully-connected layer had 64 output neurons, and the second
fully-connected layer had two output neurons for Experiment 1 and six for Experiment 2. A ReLU layer was added
between the two fully-connected layers and the probability of each category was calculated in the softmax loss
layer. The optimization function of the network was stochastic gradient descent with momentum (SGDM).
The number of epochs was tuned from 10 to 100 and the learning rate was varied over several orders
of magnitude. The CNN was tested on images of normal and anomalous behaviors from five datasets, split into
70% training and 30% testing, except for the PEL dataset, which used 43% training and 57% testing since that
dataset mainly contains fighting scenes from movies. These datasets were employed to classify anomalous gait
behavior under two experiments; the first was to identify the normal and anomalous behavior of each dataset
and the second was to concentrate on three anomalous behaviors, namely punching, kicking and pushing. A higher
number of epochs gave better accuracy although it took more time, and a learning rate of 0.001 was stable
and achieved high accuracy. Anomalous behaviors were successfully detected with 100% accuracy on all datasets
in both experiments using the CNN.
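
The SGDM update mentioned above can be sketched for a single scalar weight. This uses one common
formulation of the momentum update; the momentum value of 0.9, the larger learning rate in the toy run
and the quadratic toy loss are assumptions for illustration, not settings reported in [35].

```python
def sgdm_step(weight, velocity, gradient, learning_rate=0.001, momentum=0.9):
    """One SGDM update: the velocity accumulates past gradients, then moves the weight."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


def minimise(gradient_fn, weight, steps, learning_rate, momentum=0.9):
    """Run `steps` SGDM updates starting from `weight` with zero initial velocity."""
    velocity = 0.0
    for _ in range(steps):
        weight, velocity = sgdm_step(weight, velocity, gradient_fn(weight),
                                     learning_rate, momentum)
    return weight


# Toy loss (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
final_weight = minimise(lambda w: 2 * (w - 3), weight=0.0, steps=200, learning_rate=0.1)
```

A smaller learning rate such as the 0.001 reported above follows the same update but needs more steps,
which is consistent with the observation that more epochs improved accuracy at the cost of time.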


2.3. Transfer learning and anomalous behavior 

Recently, a new type of learning known as transfer learning was introduced, which exploits
pre-trained CNNs to yield better results, train faster and work with smaller input data [36, 37].
Technically, transfer learning leverages the knowledge of pre-trained CNNs that were previously trained
on an enormous dataset and uses it to learn a new, related task or even to carry out a task in a different
domain [37-41]. It also addresses the drawbacks of CNNs, which require huge input data, a long training
process and considerable time to formulate the best architecture for high detection accuracy [37, 39].

Two strategies for implementing transfer learning over pre-trained CNNs are feature extraction
and fine-tuning [36, 42, 43]. The layers of convolutional networks for image classification commonly
fall into two sections, (i) the convolution base and (ii) the dense base, and both strategies use
the convolution base of the pre-trained CNN to acquire the reusable knowledge of its learned weights.
Meanwhile, in the dense base, the layers are generally replaced and/or the hyperparameters fine-tuned,
as it holds knowledge specific to the previous task that is not beneficial to the current task [42, 43].

The advantage of pre-trained CNNs is that the network architecture has been trained with millions
of RGB images, especially the convolutional layers, which hold the feature extraction methods unique
to each pre-trained CNN [44]. Hand-crafted feature extraction algorithms are far behind this network
architecture [33]. Most studies used pre-trained CNNs as a feature extractor with a shallow classifier
to recognize anomalies in faces, gender, handwriting, pedestrians walking in the wrong direction
and vehicles on pedestrian walkways [38, 39, 45].

In contrast, a human skeletal sequence was developed using a double-branch VGG-16.
In the first stage, VGG-16 served as a feature extractor to create feature maps. In the second stage,
two VGG-16 networks performed joint detection and joint connection. The predictor took advantage
of the convolution process to perform better prediction of joints. The connector calculated
the connection of each joint according to the confidence score of the feature maps and maximized
the weight score of the selected edge. The information in the skeleton sequence maps was employed
by a multi-class polynomial SVM to train and test seven general behaviors, namely bend, sit, squat,
stand, run, walk and wave. Each behavior had 1000 images for testing. The recognition time for all
behaviors was around 0.1 seconds and the highest recognition rate was for squatting [46].

To the best of our knowledge, anomalous behavior related to crime was studied with SVM and AlexNet
by Xu et al., who investigated whether violent behaviors such as robbery, car smashing and street
fighting can be detected using AlexNet with SVM as the classifier. In this study, AlexNet was exploited
to perform the feature selection, feature extraction and dimensionality reduction of two datasets with
images of bend, jump, skip, run, wave, clap, etc. A multi-class linear SVM was employed as the classifier
to recognize the violent behaviors. The learning process took 30% of the data for training and the rest
was used for validation. This network attained accuracy close to 100% and it was implemented with
intelligent video surveillance at a parking lot to recognize violent behaviors. The real-time recognition
of violent behavior was good [33]. Other studies have observed that anomalous gait behavior is associated
with the place, particularly in the crime category. Squatting and peeping are suggested as anomalous gait
behaviors in the vicinity of ATMs [47, 48] and aggressiveness is an indicator for detecting anomalous gait
behavior in elevators [49].


3. CONCLUSION 

In conclusion, most studies of gait recognition focus on optimizing the recognition method.
Few studies were concerned with the motion of anomalous behavior and very few addressed anomalous
behavior in the crime category. In addition, anomalous behavior detection strongly depends on a suitable
method of gait recognition. SVM and ANN are widely used by the gait recognition community for
classification and anomaly detection. Recently, CNN has emerged as one of the methods for classification,
with positive results. Therefore, researchers are exploring and developing algorithms to achieve
better accuracy in detecting human behavior using CNN. Another potential area to be explored is anomalous
behavior related to crime. Not many studies have been done in this area, and it will be the scope of our
next work, which investigates anomalous behavior related to crime using a suitable classifier, namely
a deep learning neural network.